Video super-resolution is one of the most popular tasks on mobile devices, widely used to automatically improve low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform, which has a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 500 FPS and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
Translated by Google Translate
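Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances. Many few-shot visual recognition methods adopt the metric-based meta-learning paradigm, predicting the category of a query instance by comparing the query representation with class representations. However, current metric-based methods generally treat all instances equally and consequently often obtain biased class representations, since not all instances are equally significant when instance-level representations are summarized into a class-level representation. For example, some instances may contain unrepresentative information, such as too much background or information about unrelated concepts, which skews the results. To address these issues, we propose a novel metric-based meta-learning framework termed the instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition. Specifically, we develop an adaptive instance revaluing network that addresses the biased representation issue when generating the class representation, by learning and assigning adaptive weights to different instances according to their relative significance in the support set of the corresponding class. Additionally, we design an improved bilinear instance representation and incorporate two novel structural losses, i.e., an intra-class instance clustering loss and an inter-class representation distinguishing loss, to further regulate the instance revaluation process and refine the class representation. We conduct extensive experiments on four commonly adopted few-shot benchmarks: the miniImageNet, tieredImageNet, CIFAR-FS, and FC100 datasets. The experimental results, compared with state-of-the-art approaches, demonstrate the superiority of our ICRL-Net.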
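The idea of weighting support instances before summarizing them into a class representation can be illustrated with a minimal sketch. It is not the ICRL-Net implementation: in the abstract the weights are learned by a network, whereas here the per-instance scores, the toy 2-D embeddings, and the function names (`weighted_prototype`, `classify`) are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw per-instance scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prototype(support, scores):
    """Class representation as a weighted mean of support embeddings."""
    weights = softmax(scores)
    dim = len(support[0])
    return [sum(w * x[d] for w, x in zip(weights, support)) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is nearest to the query."""
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


# Hypothetical toy data: the third "cat" instance is an outlier
# (e.g. mostly background) and receives a low score.
cat_support = [[1.0, 0.0], [0.9, 0.1], [0.0, 5.0]]
dog_support = [[0.0, 1.0], [0.1, 0.9]]
query = [0.8, 0.2]

adaptive = {
    "cat": weighted_prototype(cat_support, [2.0, 2.0, -4.0]),
    "dog": weighted_prototype(dog_support, [0.0, 0.0]),
}
uniform = {
    "cat": weighted_prototype(cat_support, [0.0, 0.0, 0.0]),
    "dog": weighted_prototype(dog_support, [0.0, 0.0]),
}
adaptive_prediction = classify(query, adaptive)
uniform_prediction = classify(query, uniform)
```

With equal weights the outlier drags the "cat" prototype away from the query, so the two predictions differ on this toy data; down-weighting the outlier is what keeps the prototype representative.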
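Many fundamental properties of a quantum system are captured by its Hamiltonian and ground state. Despite the significance of ground state preparation (GSP), this task is classically intractable for large-scale Hamiltonians. Quantum neural networks (QNNs), which exert the power of modern quantum machines, have emerged as a leading protocol to conquer this issue. As such, how to enhance the performance of QNNs becomes a crucial topic in GSP. Empirical evidence shows that QNNs with handcrafted symmetric ansatzes generally have better trainability than those with asymmetric ansatzes, while theoretical explanations have not been explored. To fill this knowledge gap, here we propose the effective quantum neural tangent kernel (EQNTK) and connect this concept with over-parameterization theory to quantify the convergence of QNNs towards the global optima. We find that the advantage of symmetric ansatzes is attributable to their large EQNTK value with low effective dimension, which requires few parameters and a shallow quantum circuit depth to reach the over-parameterization regime, permitting a benign loss landscape and fast convergence. Guided by EQNTK, we further devise a symmetric pruning (SP) scheme that automatically tailors a symmetric ansatz from an over-parameterized and asymmetric one, greatly improving the performance of QNNs when explicit symmetry information about the Hamiltonian is unavailable. Extensive numerical simulations are conducted to validate the analytical results of EQNTK and the effectiveness of SP.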
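Convolutional neural networks (CNNs) have been demonstrated to be highly effective in the field of pulmonary nodule detection. However, existing CNN-based pulmonary nodule detection methods lack the ability to capture long-range dependencies, which is vital for global information extraction. In computer vision tasks, non-local operations have been widely used, but their computational cost can be very high for 3D computed tomography (CT) images. To address this issue, we propose a long short slice-aware network (LSSANet) for the detection of pulmonary nodules. In particular, we develop a new non-local mechanism termed long short slice grouping (LSSG), which splits the compact non-local embeddings into a short-distance slice grouped one and a long-distance slice grouped one. This not only reduces the computational burden but also keeps long-range dependencies across slices and in the whole feature map. The proposed LSSG is easy to use and can be plugged into many pulmonary nodule detection networks. To verify the performance of LSSANet, we compare it with several recently proposed, competitive detection approaches based on 2D/3D CNNs. Promising evaluation results on the large-scale PN9 dataset demonstrate the effectiveness of our method. Code is at https://github.com/ruixxxx/lssanet.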
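Few-shot part segmentation aims to separate the different parts of an object given only a few annotated samples. Due to the challenge of limited data, existing works mainly focus on learning classifiers over pre-trained features, failing to learn task-specific features for part segmentation. In this paper, we propose to learn task-specific features in a "pre-training"-"fine-tuning" paradigm. We conduct prompt designing to reduce the gap between the pre-training task (i.e., image generation) and the downstream task (i.e., part segmentation), so that the GAN priors for generation can be leveraged for segmentation. This is achieved by projecting part segmentation maps into the RGB space and interpolating between the RGB segmentation maps and the original images. Specifically, we design a fine-tuning strategy that progressively tunes an image generator into a segmentation generator, where the supervision of the generator varies from images to segmentation maps by interpolation. Moreover, we propose a two-stream architecture, i.e., a segmentation stream to generate task-specific features and an image stream to provide spatial constraints. The image stream can be regarded as a self-supervised auto-encoder, which enables our model to benefit from large-scale support images. Overall, this work is an attempt to explore the internal relevance between generation tasks and perception tasks through prompt designing. Extensive experiments show that our model achieves state-of-the-art performance on several part segmentation datasets.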
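The interpolation between an RGB-projected segmentation map and the original image can be sketched as a per-pixel linear blend. This is only a plausible reading of the abstract, not the authors' code: the palette, the linear schedule, and the names `labels_to_rgb`, `interpolate_target`, and `alpha_schedule` are assumptions introduced for illustration.

```python
# Hypothetical palette: part label -> RGB colour in [0, 1].
PALETTE = {
    0: (0.0, 0.0, 0.0),  # background
    1: (1.0, 0.0, 0.0),  # part 1
    2: (0.0, 1.0, 0.0),  # part 2
}


def labels_to_rgb(labels):
    """Project a 2-D grid of part labels into RGB space."""
    return [[PALETTE[label] for label in row] for row in labels]


def interpolate_target(image, seg_rgb, alpha):
    """Blend per pixel: (1 - alpha) * image + alpha * seg_rgb."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [
            tuple((1 - alpha) * i + alpha * s for i, s in zip(img_px, seg_px))
            for img_px, seg_px in zip(img_row, seg_row)
        ]
        for img_row, seg_row in zip(image, seg_rgb)
    ]


def alpha_schedule(num_stages):
    """Assumed linear schedule from 0 (pure image) to 1 (pure segmentation)."""
    if num_stages < 2:
        raise ValueError("need at least two stages")
    return [stage / (num_stages - 1) for stage in range(num_stages)]


# Toy 1x2 image and its part labels.
image = [[(0.2, 0.4, 0.6), (0.8, 0.8, 0.8)]]
seg_rgb = labels_to_rgb([[1, 0]])
targets = [interpolate_target(image, seg_rgb, a) for a in alpha_schedule(3)]
```

The first target equals the image and the last equals the RGB segmentation map, so a generator supervised on successive targets would move gradually from image generation towards segmentation generation.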
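Despite the editing capacity demonstrated in the latent space of a pretrained GAN model, inverting real-world images is stuck in a dilemma: the reconstruction cannot be faithful to the original input. The main reason is that the distributions of training and real-world data are misaligned, which makes GAN inversion unstable for real image editing. In this paper, we propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm. In particular, during the composition phase, we introduce a differential activation module for detecting semantic changes from a global perspective, i.e., the relative gap between the features of edited and unedited images. With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited from the paired original and edited images. In this way, the attribute-irrelevant regions survive almost in their entirety, while the quality of this intermediate result is still limited by an unavoidable ghosting effect. Consequently, in the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction. Extensive experiments show advantages over state-of-the-art methods in both qualitative and quantitative evaluations. The robustness and flexibility of our method are also validated in both single-attribute and multi-attribute manipulation scenarios.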
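The composition step, in which a mask derived from the feature gap decides where the edited image replaces the original, can be sketched as follows. This is a simplified stand-in rather than the paper's differential activation module, which is a learned component: the mask here is just a normalized absolute difference, images are single-channel grids, and the names `diff_mask` and `composite` are hypothetical.

```python
def diff_mask(feat_orig, feat_edit):
    """Absolute feature gap per position, scaled so the largest gap is 1."""
    gaps = [
        [abs(a - b) for a, b in zip(row_o, row_e)]
        for row_o, row_e in zip(feat_orig, feat_edit)
    ]
    peak = max(max(row) for row in gaps)
    if peak == 0:
        # Nothing changed: an all-zero mask keeps the original everywhere.
        return [[0.0 for _ in row] for row in gaps]
    return [[g / peak for g in row] for row in gaps]


def composite(original, edited, mask):
    """Coarse reconstruction: mask * edited + (1 - mask) * original."""
    return [
        [m * e + (1 - m) * o for o, e, m in zip(row_o, row_e, row_m)]
        for row_o, row_e, row_m in zip(original, edited, mask)
    ]


# Toy 2x2 example: only the top-right position changes semantically.
feat_orig = [[0.0, 0.0], [0.0, 0.0]]
feat_edit = [[0.0, 2.0], [0.0, 0.0]]
original = [[0.1, 0.2], [0.3, 0.4]]
edited = [[0.9, 0.8], [0.7, 0.6]]

mask = diff_mask(feat_orig, feat_edit)
coarse = composite(original, edited, mask)
```

Positions with a zero mask keep the original pixel, which is how attribute-irrelevant regions are preserved; soft mask values between 0 and 1 mix both images, which is one way to picture the ghosting that the decomposition phase is meant to remove.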
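Segmentation of plant point clouds to obtain high-precision morphological traits is essential for plant phenotyping and crop breeding. Although the boom in deep learning methods has spurred much research on plant point cloud segmentation, most works follow the common practice of hard voxelization-based or down-sampling-based methods. They are limited to segmenting simple plant organs and overlook the difficulties of resolving complex plant point clouds with high spatial resolution. In this study, we propose a deep learning network, the plant segmentation transformer (PST), to achieve semantic and instance segmentation of MLS (Mobile Laser Scanning) oilseed rape point clouds, whose main traits are tiny siliques and dense points. PST is composed of: (i) a dynamic voxel feature encoder (DVFE) that aggregates per-point features at the raw spatial resolution; (ii) a dual window sets attention block to capture contextual information; (iii) a dense feature propagation module to obtain the final dense point feature map. The results show that PST and PST-PointGroup (PG) achieve state-of-the-art performance in semantic and instance segmentation tasks. For semantic segmentation, PST reaches 93.96%, 97.29%, 96.52%, 96.88%, and 97.07% in mean IoU, mean precision, mean recall, mean F1-score, and overall accuracy, respectively. For instance segmentation, PST-PG reaches 89.51%, 89.85%, 88.83%, and 82.53% in mCov, mWCov, mPerc90, and mRec90, respectively. This study extends the phenotyping of oilseed rape in an end-to-end way and shows that deep learning methods have great potential for understanding dense plant point clouds with complex morphological traits.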
To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide a comprehensive and semantic representation of heterogeneous data, and have been successfully leveraged in many biomedical applications, including drug repurposing. Our objective is to construct a knowledge graph from the literature to study relations between Alzheimer's disease (AD) and chemicals, drugs, and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Among the three knowledge graph completion models, TransE outperformed the other two (MR = 13.45, Hits@1 = 0.306). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most of the highly ranked candidates predicted by our model, which indicates that our approach can yield reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The constructed knowledge graph can facilitate data-driven knowledge discovery and the generation of novel hypotheses.
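For readers unfamiliar with the reported metrics, the following minimal sketch shows how TransE scores a triple and how mean rank (MR) and Hits@k are computed from the ranks of the true tails. The embeddings, entity names, and test triples are made-up toy values, not data from the study, and the ranking is unfiltered for simplicity.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 norm of (head + relation - tail)."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def rank_of_true_tail(head, relation, true_tail, entity_emb):
    """1-based rank of the true tail among all candidate entities."""
    scores = {
        name: transe_score(head, relation, emb)
        for name, emb in entity_emb.items()
    }
    true_score = scores[true_tail]
    return 1 + sum(1 for s in scores.values() if s > true_score)


def mean_rank(ranks):
    """MR: average rank of the true entity (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Hits@k: fraction of test triples ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical 2-D embeddings for four entities and one relation.
entity_emb = {
    "e1": [0.0, 0.0],
    "e2": [1.0, 0.1],
    "e3": [0.0, 2.0],
    "e4": [3.0, 4.0],
}
relation_emb = [1.0, 0.0]

# Hypothetical test triples given as (head, true tail).
test_triples = [("e1", "e2"), ("e3", "e4")]
ranks = [
    rank_of_true_tail(entity_emb[h], relation_emb, t, entity_emb)
    for h, t in test_triples
]
mr = mean_rank(ranks)
hits1 = hits_at_k(ranks, 1)
```

On this toy data the first triple is ranked first and the second last, which shows how a single badly ranked triple inflates MR while Hits@1 only counts exact top-1 predictions.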
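In the last decade, convolutional neural networks (ConvNets) have dominated the field of medical image analysis. However, the performance of ConvNets may still be limited by their inability to model long-range spatial relations between voxels in an image. Numerous vision Transformers have recently been proposed to address the shortcomings of ConvNets, demonstrating state-of-the-art performance in many medical imaging applications. Transformers may be a strong candidate for image registration because their self-attention mechanism enables a more precise comprehension of the spatial correspondence between moving and fixed images. In this paper, we present TransMorph, a hybrid Transformer-ConvNet model for volumetric medical image registration. We also introduce three variants of TransMorph: two diffeomorphic variants that ensure topology-preserving deformations, and a Bayesian variant that produces a well-calibrated registration uncertainty estimate. The proposed models are extensively validated against a variety of existing registration methods and Transformer architectures using volumetric medical images from two applications: inter-patient brain MRI registration and phantom-to-CT registration. Qualitative and quantitative results show that TransMorph and its variants lead to a substantial improvement over the baseline methods, demonstrating the effectiveness of Transformers for medical image registration.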
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.